Dirty Statistical Models

Authors

  • Eunho Yang
  • Pradeep Ravikumar
Abstract

We provide a unified framework for the high-dimensional analysis of “superposition-structured” or “dirty” statistical models, where the model parameters are a superposition of structurally constrained parameters. We allow for any number and types of structures, and any statistical model. We consider the general class of M-estimators that minimize the sum of any loss function and an instance of what we call a “hybrid” regularization, that is, the infimal convolution of weighted regularization functions, one for each structural component. We provide corollaries showcasing our unified framework for varied statistical models such as linear regression, multiple regression and principal component analysis, over varied superposition structures.
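
The estimator described in the abstract can be written out roughly as follows. This is a sketch in notation assumed for illustration, not taken from the abstract itself: $I$ indexes the structural components, $\theta_\alpha$ is the component with structure $\alpha$, $\mathcal{L}$ is the loss, $\mathcal{R}_\alpha$ is the regularizer for component $\alpha$, and $\lambda_\alpha > 0$ is its weight.

```latex
% Hybrid regularizer: infimal convolution of the weighted regularizers
\[
  \mathcal{R}(\theta)
  \;=\;
  \inf\Big\{ \sum_{\alpha \in I} \lambda_\alpha \, \mathcal{R}_\alpha(\theta_\alpha)
  \;:\; \sum_{\alpha \in I} \theta_\alpha = \theta \Big\}
\]

% M-estimator: loss plus hybrid regularizer, or equivalently a joint
% minimization over the individual components
\[
  \min_{\theta} \; \mathcal{L}(\theta) + \mathcal{R}(\theta)
  \quad\Longleftrightarrow\quad
  \min_{(\theta_\alpha)_{\alpha \in I}} \;
  \mathcal{L}\Big(\sum_{\alpha \in I} \theta_\alpha\Big)
  + \sum_{\alpha \in I} \lambda_\alpha \, \mathcal{R}_\alpha(\theta_\alpha)
\]
```

As one illustrative instance (an assumed example, not a claim about which cases the paper covers), a "sparse plus low-rank" matrix parameter would take two components, with $\mathcal{R}_1$ the elementwise $\ell_1$ norm and $\mathcal{R}_2$ the nuclear norm.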

Similar resources

ActiveClean: Interactive Data Cleaning For Statistical Modeling

Analysts often clean dirty data iteratively–cleaning some data, executing the analysis, and then cleaning more data based on the results. We explore the iterative cleaning process in the context of statistical model training, which is an increasingly popular form of data analytics. We propose ActiveClean, which allows for progressive and iterative cleaning in statistical modeling problems while...

Data Cleaning using Probabilistic Models of Integrity Constraints

In data cleaning, data quality rules provide a valuable tool for enforcing the correct application of semantics on a dataset. Traditional rule discovery techniques assume a reasonably clean dataset, and fail when faced with a dirty one. Enforcement of these rules for error detection is much less effective when mined on dirty data. In the databases literature, a popular and expressive type of lo...

ActiveClean: Interactive Data Cleaning While Learning Convex Loss Models

Data cleaning is often an important step to ensure that predictive models, such as regression and classification, are not affected by systematic errors such as inconsistent, out-of-date, or outlier data. Identifying dirty data is often a manual and iterative process, and can be challenging on large datasets. However, many data cleaning workflows can introduce subtle biases into the training pro...

QUIC & DIRTY: A Quadratic Approximation Approach for Dirty Statistical Models

In this paper, we develop a family of algorithms for optimizing “superposition-structured” or “dirty” statistical estimators for high-dimensional problems involving the minimization of the sum of a smooth loss function with a hybrid regularization. Most of the current approaches are first-order methods, including proximal gradient or Alternating Direction Method of Multipliers (ADMM). We propose...

Impacts of Dirty Data: and Experimental Evaluation

Data quality issues have attracted widespread attention due to the negative impacts of dirty data on data mining and machine learning results. The relationship between data quality and the accuracy of results could be applied on the selection of the appropriate algorithm with the consideration of data quality and the determination of the data share to clean. However, rare research has focused o...

Publication date: 2013